Abstract

"Asymptotic formulae for likelihood-based tests of new physics" presents a mathematical formalism for a new approximation for hypothesis testing in high energy physics. The approximations are designed to greatly reduce the computational burden for such problems. We seek to test the conditions under which the approximations described remain valid. To do so, we perform parallel calculations for a range of scenarios and compare the full calculation to the approximations to determine the limits and robustness of the approximation. We compare this approximation against values calculated with the Collie framework, which for our analysis we assume produces true values.

1 Introduction



One of the primary goals in experimental particle physics is the search for new particles. In order to determine whether or not a particle has been discovered, statistical hypothesis tests are used. The probability of finding an outcome as extreme as the one observed can be compared to a predetermined threshold to ascertain whether or not discovery has occurred.

Unfortunately, due to the sheer amount of data involved in the search for new particles, determining probabilities is often computationally intensive. In this paper we examine the approximation presented in "Asymptotic formulae for likelihood-based tests of new physics," to find the limits of its applicability. This approximation is evaluated to determine when it successfully reproduces the results from a full semi-frequentist computation with no approximations (Section 4). Conclusions based on these findings are presented in Section 5.

Presented below is the necessary prerequisite knowledge; this includes general statistics (Section 2.1) such as hypothesis testing and the likelihood ratio (Section 2.2), as well as an explanation of how these techniques are used in particle physics (Section 2.4). We then explain the mathematical basis for the Asimov data set based upon results from Wilks and Wald (Section 3), as given by the authors of [1]. The Asimov data set is a representative set of values that theoretically represents the true parameters of the full ensemble. This set represents an ensemble of simulated data; it is described in greater depth later (Section 3). Henceforth the three approximations together will be abbreviated as the AWW approximation, an acronym of their names (Asimov, Wilks and Wald). This allows us to examine the possibility that the approximation generates valid parameters. The approximation, the full mathematical formalism and subsequent evidence are presented in arXiv:1007.1727v2, upon which our explanation and formalism are based [1].

2 Mathematical Formalism 

Presented here are some basic statistical principles, such as hypothesis testing and test statistics, as well as more complex ideas like the likelihood function and its application to binned data. This section ends with a brief overview of statistical methods used in particle physics.

2.1 Basic Statistics 

A hypothesis is a suggested solution to explain a given phenomenon. One often compares the validity of two hypotheses through statistical testing, where one decides whether a given null hypothesis, $H_0$, should be rejected in favor of the alternate hypothesis, $H_1$. In particle physics the null hypothesis typically contains all known processes and the alternate hypothesis may also contain a new process or particle. That is, the null hypothesis would be background-only and the alternate hypothesis would then be signal-plus-background.

A test statistic is a function of the sample and is assumed to be a numerical summary of the data that can be used to reject, or fail to reject, a hypothesis. This can be done by calculating the probability of obtaining a test statistic as extreme as the one observed, which is called a p-value. This represents the level of agreement between the data and a single hypothesis. The p-value can be measured against a significance level $\alpha$, defined as the critical p-value; i.e. $p$ must be less than or equal to $\alpha$ to reject a given hypothesis.

The p-value can also be converted to a standardized value, such as a Z-score, the number of standard deviations a datum is from the mean; $Z$ is given as a function of $p$ by

$$Z(p) = \Phi^{-1}(1 - p), \tag{1}$$

where $\Phi^{-1}(p) = \sqrt{2}\,\mathrm{Erf}^{-1}(2p - 1)$ is the quantile of the standard Gaussian.$^1$ At $\alpha = 0.05$, a commonly used significance level, the Z-score is equal to 1.64 for a one-sided test; a one-sided test is used when the critical outcomes capable of rejecting a hypothesis occur on only one side of the distribution. Because we can distinguish between positive and negative fluctuations in our tests we use a one-sided test, with a 95% confidence level (CL) exclusion.
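As a quick numerical check of Eq. (1), the conversion between one-sided p-values and Z-scores can be sketched with Python's standard library. The function names `z_from_p` and `p_from_z` are illustrative choices and are not part of any framework used in this paper.

```python
from math import erfc, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, standard deviation 1


def z_from_p(p: float) -> float:
    """One-sided Z-score for a p-value, Z(p) = Phi^{-1}(1 - p)."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def p_from_z(z: float) -> float:
    """One-sided p-value for a Z-score, p = 1 - Phi(Z)."""
    # erfc keeps precision in the far tail, where 1 - Phi(Z) would round badly.
    return 0.5 * erfc(z / sqrt(2.0))


if __name__ == "__main__":
    print(f"Z(p = 0.05) = {z_from_p(0.05):.3f}")
    print(f"p(Z = 5)    = {p_from_z(5.0):.3e}")
```

Run as a script, this should reproduce the one-sided Z-score quoted above for $\alpha = 0.05$.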



2.2 The Likelihood Function and Maximization 

The likelihood of a set of parameter values, given an observation, is equal to the probability of that observation given those parameter values.

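Consider a set of $N$ observables, contained in $\mathbf{x} = (x_1, \ldots, x_N)$, described by a probability distribution function (p.d.f.) $f(x; \boldsymbol{\theta})$, where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_n)$ are the unknown parameters, which are also known as the nuisance parameters. Assuming statistical independence between the measurements $x_i$, the likelihood function $L(\boldsymbol{\theta})$ is

$$L(\boldsymbol{\theta}) = \prod_{i=1}^{N} f(x_i; \boldsymbol{\theta}). \tag{2}$$

The $\boldsymbol{\theta}$ values that maximize this function are denoted $\hat{\boldsymbol{\theta}}$. In order to find the maximum likelihood (ML) estimators one can solve the formula [6]

$$\frac{\partial \ln L}{\partial \theta_i} = 0, \quad i = 1, \ldots, n. \tag{3}$$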

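The covariance matrix of the ML estimators, $V_{ij} = \mathrm{cov}[\hat{\theta}_i, \hat{\theta}_j]$, can be used to estimate the standard deviation, $\sigma$. We can find this by first finding the inverse covariance matrix, which can be approximated as

$$(\widehat{V^{-1}})_{ij} = -\left.\frac{\partial^2 \ln L}{\partial \theta_i \, \partial \theta_j}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}}, \tag{4}$$

and then inverting the resulting matrix to find the standard deviation. The matrix in Eq. (4) is also known as the curvature matrix, and can only be used when the positive and negative deviations are equal.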
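To make Eqs. (3) and (4) concrete, the following sketch works through a one-parameter case. The exponential p.d.f. $f(x; \tau) = e^{-x/\tau}/\tau$, the sample size, the random seed and the finite-difference step are all assumptions chosen for illustration; they are not taken from the analysis in this paper.

```python
import math
import random


def log_likelihood(tau: float, xs: list[float]) -> float:
    """ln L(tau) for independent draws from f(x; tau) = exp(-x / tau) / tau."""
    return -len(xs) * math.log(tau) - sum(xs) / tau


def ml_estimate(xs: list[float]) -> float:
    """Solving d ln L / d tau = 0 analytically gives the sample mean."""
    return sum(xs) / len(xs)


def sigma_from_curvature(tau_hat: float, xs: list[float], rel_step: float = 1e-4) -> float:
    """Standard deviation from the curvature, V^{-1} = -d^2 ln L / d tau^2 at tau_hat."""
    h = rel_step * tau_hat
    # Central finite difference for the second derivative of ln L.
    second_derivative = (
        log_likelihood(tau_hat + h, xs)
        - 2.0 * log_likelihood(tau_hat, xs)
        + log_likelihood(tau_hat - h, xs)
    ) / h**2
    return math.sqrt(-1.0 / second_derivative)


if __name__ == "__main__":
    rng = random.Random(1)   # fixed seed, chosen arbitrarily
    true_tau = 2.0           # hypothetical true parameter
    sample = [rng.expovariate(1.0 / true_tau) for _ in range(10_000)]
    tau_hat = ml_estimate(sample)
    sigma = sigma_from_curvature(tau_hat, sample)
    print(f"tau_hat = {tau_hat:.4f} +/- {sigma:.4f}")
    print(f"analytic sigma = {tau_hat / math.sqrt(len(sample)):.4f}")
```

For this model the curvature can also be evaluated by hand, giving $\sigma = \hat{\tau}/\sqrt{N}$, so the two printed uncertainties should agree closely.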



$^1$Erf is the error function. $\mathrm{Erf}^{-1}(z) = \sum_{k=0}^{\infty} \frac{c_k}{2k+1} \left(\frac{\sqrt{\pi}}{2} z\right)^{2k+1}$, where $c_0 = 1$ and $c_k = \sum_{m=0}^{k-1} \frac{c_m c_{k-1-m}}{(m+1)(2m+1)} = \left\{1, 1, \frac{7}{6}, \frac{127}{90}, \ldots\right\}$.

2.3 Likelihood Approximation for Binned Data

If a sample size is large it is often easier to bin the data into a histogram. This results in a vector $\mathbf{n} = (n_1, \ldots, n_N)$ with expectation value $\boldsymbol{\nu} = E[\mathbf{n}]$ and p.d.f. $f(\mathbf{n}; \boldsymbol{\nu})$. Maximizing the likelihood ratio is equivalent to minimizing the quantity $-2 \ln \lambda(\boldsymbol{\theta})$. For independent, Poisson distributed $n_i$ this quantity is [5]

$$-2 \ln \lambda(\boldsymbol{\theta}) = 2 \sum_{i=1}^{N} \left[ \nu_i(\boldsymbol{\theta}) - n_i + n_i \ln \frac{n_i}{\nu_i(\boldsymbol{\theta})} \right], \tag{5}$$

where the last term is zero when $n_i = 0$. According to Wilks' theorem, for sufficiently large samples that meet certain regularity conditions, the minimum of Eq. (5) follows a $\chi^2$ distribution, allowing the usage of goodness-of-fit tests [2].
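Eq. (5) translates directly into a few lines of code. The sketch below is a minimal illustration; the function name `neg2_log_lambda` and the bin contents in the usage example are made up for this purpose.

```python
import math
from typing import Sequence


def neg2_log_lambda(n: Sequence[float], nu: Sequence[float]) -> float:
    """Eq. (5): 2 * sum_i [nu_i - n_i + n_i * ln(n_i / nu_i)] for Poisson bins."""
    if len(n) != len(nu):
        raise ValueError("n and nu must have the same number of bins")
    total = 0.0
    for n_i, nu_i in zip(n, nu):
        if nu_i <= 0.0:
            raise ValueError("expected bin contents must be positive")
        term = nu_i - n_i
        if n_i > 0.0:  # the logarithmic term vanishes for empty bins
            term += n_i * math.log(n_i / nu_i)
        total += term
    return 2.0 * total


if __name__ == "__main__":
    observed = [12, 7, 0, 3]           # made-up bin contents
    expected = [10.0, 8.0, 1.5, 2.5]   # made-up model prediction
    print(f"-2 ln lambda = {neg2_log_lambda(observed, expected):.3f}")
```

The quantity is zero when the prediction matches the data exactly in every bin and grows as the two diverge.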



2.4 Particle Physics Statistics 

This subsection describes how the aforementioned statistical principles are often applied in particle physics. In particle physics a Z-score greater than or equal to 5, or $p = 2.87 \times 10^{-7}$ for a one-sided tail, is usually required for discovery, which results from the rejection of the background-only hypothesis.

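For binned data with a histogram of variable $x$ and information $\mathbf{n} = (n_1, \ldots, n_N)$, the expectation value is

$$E[n_i] = \mu s_i + b_i, \tag{6}$$

where $\mu$ is the signal strength, and $s_i$ and $b_i$ are the mean number of entries in the $i$th bin from signal and background, meaning [1]

$$s_i = s_{\mathrm{tot}} \int_{\mathrm{bin}\ i} f_s(x; \boldsymbol{\theta}_s) \, dx, \tag{7}$$

$$b_i = b_{\mathrm{tot}} \int_{\mathrm{bin}\ i} f_b(x; \boldsymbol{\theta}_b) \, dx. \tag{8}$$

Here $f_s(x; \boldsymbol{\theta}_s)$ and $f_b(x; \boldsymbol{\theta}_b)$ are the p.d.f.s of the variable $x$ for signal and background events respectively. The signal strength is equal to zero for the background-only hypothesis and one for the nominal signal hypothesis. Henceforth, $\boldsymbol{\theta}$ contains all nuisance parameters, i.e. $\boldsymbol{\theta} = (\boldsymbol{\theta}_s, \boldsymbol{\theta}_b, b_{\mathrm{tot}})$; $s_{\mathrm{tot}}$ is not contained in $\boldsymbol{\theta}$ because its value is fixed by the prediction from the nominal signal hypothesis.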

One can create a control sample that measures only background events, with information contained in histogram $\mathbf{m} = (m_1, \ldots, m_M)$. The expectation value of $m_i$ is

$$E[m_i] = u_i(\boldsymbol{\theta}), \tag{9}$$

where $u_i$ is dependent on the nuisance parameters. The purpose of the control sample is to add useful constraints to the nuisance parameters.

Using the signal-plus-background and background-only information, the likelihood function can be written as a product of two Poisson probabilities

$$L(\mu, \boldsymbol{\theta}) = \prod_{j=1}^{N} \frac{(\mu s_j + b_j)^{n_j}}{n_j!} e^{-(\mu s_j + b_j)} \prod_{k=1}^{M} \frac{u_k^{m_k}}{m_k!} e^{-u_k}. \tag{10}$$

The test statistic we are interested in is $\gamma = -2 \ln \lambda(\mu)$, where

$$\lambda(\mu) = \frac{L(\mu, \hat{\hat{\boldsymbol{\theta}}})}{L(\hat{\mu}, \hat{\boldsymbol{\theta}})} \tag{11}$$

is the profile likelihood ratio. Here $\hat{\hat{\boldsymbol{\theta}}}$ denotes the conditional maximum-likelihood estimator for the specified $\mu$; $\hat{\mu}$ and $\hat{\boldsymbol{\theta}}$ are the unconditional maximum-likelihood estimators.

Assigning our observed value as $\gamma'$, we can calculate the p-value from

$$p(\mu) = \int_{\gamma'}^{\infty} f(\gamma | \mu) \, d\gamma, \tag{12}$$

where $f(\gamma | \mu)$ is the p.d.f. of $\gamma$ for the given signal strength $\mu$ [1].
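The profile likelihood ratio can be evaluated in closed form for the simplest case of one signal bin and one control bin. The sketch below assumes a single nuisance parameter $b$, a signal-region count $n \sim \mathrm{Poisson}(\mu s + b)$ and a control count $m \sim \mathrm{Poisson}(k b)$ with a known scale factor $k$. This two-bin model, the factor $k$ and all function names are illustrative assumptions, not the configuration used in the tests of this paper.

```python
import math


def log_poisson(n: float, nu: float) -> float:
    """ln of the Poisson probability of n for mean nu; lgamma(n + 1) = ln n!."""
    return n * math.log(nu) - nu - math.lgamma(n + 1.0)


def log_likelihood(mu: float, b: float, n: float, m: float, s: float, k: float) -> float:
    """ln L(mu, b) for n ~ Poisson(mu*s + b) and a control count m ~ Poisson(k*b)."""
    return log_poisson(n, mu * s + b) + log_poisson(m, k * b)


def conditional_b(mu: float, n: float, m: float, s: float, k: float) -> float:
    """Conditional ML estimator of b for fixed mu >= 0.

    Setting d ln L / d b = 0 gives a*b^2 + lin*b - m*mu*s = 0 with a = 1 + k
    and lin = a*mu*s - n - m; the positive root is returned.
    """
    a = 1.0 + k
    lin = a * mu * s - n - m
    return (-lin + math.sqrt(lin**2 + 4.0 * a * m * mu * s)) / (2.0 * a)


def neg2_log_profile_ratio(mu: float, n: float, m: float, s: float, k: float) -> float:
    """-2 ln lambda(mu), with lambda(mu) = L(mu, b_cond) / L(mu_hat, b_hat)."""
    b_hat = m / k                 # unconditional estimators (require n, m > 0)
    mu_hat = (n - b_hat) / s
    b_cond = conditional_b(mu, n, m, s, k)
    return 2.0 * (
        log_likelihood(mu_hat, b_hat, n, m, s, k)
        - log_likelihood(mu, b_cond, n, m, s, k)
    )


if __name__ == "__main__":
    n, m, s, k = 120.0, 200.0, 10.0, 2.0   # made-up counts and yields
    for mu in (0.0, 1.0, 2.0, 3.0):
        print(f"mu = {mu:.1f}: -2 ln lambda = {neg2_log_profile_ratio(mu, n, m, s, k):.3f}")
```

The statistic vanishes at $\mu = \hat{\mu}$ and increases as the tested $\mu$ moves away from it, which is the behaviour the p-value integral in Eq. (12) relies on.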

3 The Asimov Data Set Approximation 

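The conditional definition of the Asimov data set is that when one uses it to evaluate the estimators for all parameters one obtains the true parameter values, i.e. it represents the maximum likelihood for the parent p.d.f. In order to test if the Asimov condition holds one can use the generic likelihood function Eq. (2). Using the simplified notation $\nu_i = \mu' s_i + b_i$, and setting $\theta_0 = \mu$, then Eq. (3) becomes

$$\frac{\partial \ln L}{\partial \theta_j} = \sum_{i=1}^{N} \left( \frac{n_i}{\nu_i} - 1 \right) \frac{\partial \nu_i}{\partial \theta_j} + \sum_{i=1}^{M} \left( \frac{m_i}{u_i} - 1 \right) \frac{\partial u_i}{\partial \theta_j} = 0. \tag{13}$$

If $n_{i,A} = E[n_i]$ and $m_{i,A} = E[m_i]$, where the subscript $A$ denotes Asimov values, then the Asimov condition is met. We cannot calculate the Asimov likelihood $L_A$ because it contains factorial dependence on Asimov values that can be non-integer. However, these factorials are canceled in the Asimov profile likelihood ratio

$$\lambda_A(\mu) = \frac{L_A(\mu, \hat{\hat{\boldsymbol{\theta}}})}{L_A(\hat{\mu}, \hat{\boldsymbol{\theta}})} = \frac{L_A(\mu, \hat{\hat{\boldsymbol{\theta}}})}{L_A(\mu', \boldsymbol{\theta})}, \tag{14}$$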
where the substitution in the denominator of the final equality is allowed by the definition of the Asimov data set [1].

3.1 The Wald Equation

Suppose we have a test with strength parameter $\mu$ and the data are distributed according to strength parameter $\mu'$; then according to Wald [3]

$$-2 \ln \lambda(\mu) = \frac{(\mu - \hat{\mu})^2}{\sigma^2} + \mathcal{O}(1/\sqrt{N}), \tag{15}$$

where $N$ is the sample size and $\hat{\mu}$ follows a Gaussian distribution with mean $\mu'$ and standard deviation $\sigma$. Here $\sigma$ is found using the covariance matrix.

Substituting the Asimov data set with strength parameter $\mu'$ into the Wald approximation equation, it follows from Eq. (15) that

$$-2 \ln \lambda_A(\mu) \approx \frac{(\mu - \mu')^2}{\sigma^2} \tag{16}$$

for large samples. We provide an alternate way to find the standard deviation via the Asimov data set, defining $q_{\mu,A} = -2 \ln \lambda_A(\mu)$,

$$\sigma^2 = \frac{(\mu - \mu')^2}{q_{\mu,A}}. \tag{17}$$

To find the median exclusion significance assuming there is no signal, $\mu' = 0$, Eq. (17) reduces to

$$\sigma^2 = \frac{\mu^2}{q_{\mu,A}}. \tag{18}$$

Similarly for the case of discovery where $\mu = 0$, Eq. (17) is

$$\sigma^2 = \frac{\mu'^2}{q_{0,A}}. \tag{19}$$
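A minimal numerical illustration of Eqs. (17) to (19) is given below for a single Poisson bin with a known background and no nuisance parameters. This simplified model and the yields `s` and `b` are assumptions made for the sketch; the tests in this paper use binned histograms with systematic uncertainties.

```python
import math


def q_mu_asimov(mu: float, mu_prime: float, s: float, b: float) -> float:
    """-2 ln lambda_A(mu) for one Poisson bin with known background b.

    The Asimov count is n_A = mu' * s + b. Evaluated on it, the ML estimator
    returns mu_hat = mu', so the denominator likelihood has mean n_A.
    """
    n_asimov = mu_prime * s + b
    nu = mu * s + b
    return 2.0 * (nu - n_asimov + n_asimov * math.log(n_asimov / nu))


def sigma_from_asimov(mu: float, mu_prime: float, s: float, b: float) -> float:
    """Eq. (17): sigma^2 = (mu - mu')^2 / q_{mu,A}; requires mu != mu'."""
    return math.sqrt((mu - mu_prime) ** 2 / q_mu_asimov(mu, mu_prime, s, b))


if __name__ == "__main__":
    s, b = 50.0, 10_000.0   # hypothetical signal and background yields
    sigma_excl = sigma_from_asimov(mu=1.0, mu_prime=0.0, s=s, b=b)   # Eq. (18)
    sigma_disc = sigma_from_asimov(mu=0.0, mu_prime=1.0, s=s, b=b)   # Eq. (19)
    print(f"sigma (exclusion, mu' = 0) = {sigma_excl:.4f}")
    print(f"sigma (discovery, mu = 0)  = {sigma_disc:.4f}")
    print(f"sqrt(b) / s                = {math.sqrt(b) / s:.4f}")
```

For this counting model the estimator is $\hat{\mu} = (n - b)/s$, whose standard deviation under background only is $\sqrt{b}/s$, so for large counts the Asimov value of $\sigma$ should land close to that number.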

3.2 The Tevatron Test Statistic 

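The test statistic

$$q = -2 \ln \frac{L_{s+b}}{L_b} \tag{20}$$

is often used in analyses at the Fermilab Tevatron Collider. Here $L_{s+b}$ is the likelihood of the nominal signal model with strength parameter $\mu = 1$, and $L_b$ is that of the background-only hypothesis with $\mu = 0$. Rewriting Eq. (20),

$$q = -2 \ln \frac{L(\mu = 1, \hat{\hat{\boldsymbol{\theta}}}(1))}{L(\mu = 0, \hat{\hat{\boldsymbol{\theta}}}(0))} = -2 \ln \lambda(1) + 2 \ln \lambda(0). \tag{21}$$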

If the Wald approximation holds, then

$$q = \frac{(\hat{\mu} - 1)^2}{\sigma^2} - \frac{\hat{\mu}^2}{\sigma^2} = \frac{1 - 2\hat{\mu}}{\sigma^2}. \tag{22}$$

Since $\hat{\mu}$ is Gaussian and $q$ is dependent on $\hat{\mu}$ then $q$ is also Gaussian. Therefore, the expectation value and standard deviation of $q$ are [1]

$$E[q] = \frac{1 - 2\mu'}{\sigma^2}, \tag{23}$$

$$\sigma_q = \frac{2}{\sigma}. \tag{24}$$

Since $q$ is Gaussian we can use the cumulative distribution function$^2$ to determine the p-value. Plugging in what we know of the signal strengths of the two hypotheses, as well as the mean and standard deviation of $q$,$^3$

$$p_{s+b} = \int_{q_{\mathrm{obs}}}^{\infty} f(q | s+b) \, dq = 1 - \Phi\left( \frac{q_{\mathrm{obs}} + 1/\sigma_{s+b}^2}{2/\sigma_{s+b}} \right), \tag{25}$$

$$p_b = \int_{-\infty}^{q_{\mathrm{obs}}} f(q | b) \, dq = \Phi\left( \frac{q_{\mathrm{obs}} - 1/\sigma_b^2}{2/\sigma_b} \right). \tag{26}$$
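The Gaussian p-values of Eqs. (25) and (26) follow from the mean $(1 - 2\mu')/\sigma^2$ and standard deviation $2/\sigma$ of $q$. A short sketch of that calculation is shown below; the function names and the numerical inputs are illustrative assumptions and do not come from Collie.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf   # cumulative distribution of the standard Gaussian


def q_mean(mu_prime: float, sigma: float) -> float:
    """Eq. (23): E[q] = (1 - 2 mu') / sigma^2."""
    return (1.0 - 2.0 * mu_prime) / sigma**2


def q_std(sigma: float) -> float:
    """Eq. (24): sigma_q = 2 / sigma."""
    return 2.0 / sigma


def p_s_plus_b(q_obs: float, sigma_sb: float) -> float:
    """Eq. (25): probability of q >= q_obs under signal plus background (mu' = 1)."""
    return 1.0 - _PHI((q_obs - q_mean(1.0, sigma_sb)) / q_std(sigma_sb))


def p_b(q_obs: float, sigma_b: float) -> float:
    """Eq. (26): probability of q <= q_obs under background only (mu' = 0)."""
    return _PHI((q_obs - q_mean(0.0, sigma_b)) / q_std(sigma_b))


if __name__ == "__main__":
    sigma = 0.5                      # made-up standard deviation of mu_hat
    q_obs = q_mean(0.0, sigma)       # observation at the background-only median
    print(f"p_b     = {p_b(q_obs, sigma):.4f}")
    print(f"p_(s+b) = {p_s_plus_b(q_obs, sigma):.4f}")
```

Placing the observation at the background-only expectation gives $p_b = 0.5$ by construction, which is a convenient sanity check for the signs in the two formulas.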

4 Pseudo-data Tests 

In order to test if the AWW approximation reproduces the real distributions of the test statistic $\gamma$ we created a set of test data, applied various systematic uncertainties and compared with the values produced by the Collie framework. We calculate the signal strength required to achieve a given significance level in both models and compare.

The pseudo-data generated has log-likelihood ratios similar to a set of Tevatron data by construction, and is displayed in Fig. 1. We define the data as equal to the background before systematic uncertainties.

The Collie software suite generates semi-frequentist confidence intervals with an output designed for ROOT [1]. Here we will consider the Collie confidence level value to be true for the sake of evaluating the Asimov conditions. Collie also outputs the observed, signal-plus-background, and background-only log-likelihood ratios, which are used to calculate the AWW approximation.

From Eq. (18), with $\mu = 1$ from the nominal signal hypothesis we have

$$\sigma_{s+b}^2 = \frac{1}{q_{s+b}}. \tag{27}$$

Substituting this value into Eq. (25) provides a simple calculation of the AWW approximation using the Collie output. We report results in terms of a ratio; this ratio is always the approximation value divided by the Collie value. We keep this standard because the Wald approximation should result in underestimation, thus the ratio should stay below one.

$^2$For a normal variable with mean $\mu$, variance $\sigma^2$ and observation $x$ the cumulative distribution function is $\Phi\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$.

$^3$The original paper contains confusing notation and a substitution error in their derivation; the formulas presented here are correct.

Figure 1: On the left is a plot of the test input data generated in ROOT with 1,000,000 events placed into 1500 bins. The background was filled with an exponential decay function, $e^{-x/\tau}$, with $\tau = 0.203$ and weight 135000/event count, and the signal was filled by a mirrored exponential decay function, $1 - e^{-x/\tau}$, with $\tau = 0.215$ and weight 214/event count. For ease of interpretation the signal is displayed with a scale factor of 500 and the y-axis is linear. This histogram has 1500 bins and 1,000,000 events, which are used in this paper unless stated otherwise. On the right we display the ratio of signal over background. The x-axis of both panels is the final variable.



4.1 Background-only Rate Systematic Uncertainty 

The first systematic uncertainty applied was a rate systematic uncertainty on only the background. Our results are plotted in Fig. 2. As expected we see no discrepancy when there is no uncertainty, i.e. when the background rate systematic uncertainty is set at 0%, meaning the data and background are equal.

Figure 2: This experiment measured the two methods against each other while they accounted for a background-only rate systematic uncertainty that varied from zero to fifty percent in five percent increments. (a) shows the signal scale necessary to achieve the CL for both methods, (b) shows the ratio of the approximation value over the Collie value, (c) shows the fractional uncertainty of the background at 25% uncertainty in the rate systematic uncertainty.



As we apply the rate systematic uncertainty we see deviations of up to around 5% from the "true" value, with no obvious trend as a function of the systematic uncertainty percentage. Therefore the AWW approximation is valid for this type of uncertainty.

4.2 Signal and Background Rate Systematic Uncertainties 

Perhaps the most striking results were the three-dimensional plots where the axes in the horizontal plane represent the percent rate systematic uncertainties of the signal and background histograms. We created plots of both uncorrelated and correlated systematic uncertainties. No systematic uncertainty plots are shown as they are equivalent to the systematic uncertainty plot in the background rate systematic uncertainty, only now applied to signal as well as background.

Fig. 3 displays the uncorrelated data set and Fig. 4 the correlated. With relatively flat signal scale ratios at the C.L. we conclude that the AWW approximation is valid for these systematic uncertainties.

Figure 3: The horizontal plane contains the axes for the background and signal rate systematic uncertainty percentages, varying from 5% to 45% in 10% increments. The variable we are interested in is the ratio of the AWW approximation over the Collie value, which is presented on the z-axis. The color is a gradient and the scale is held in this manner for comparison to the correlated plot.

Figure 4: The axes, scale and confidence level are equivalent to those of Fig. 3. These plots appear fairly similar.

4.3 Asymmetric Gaussian "Flat" Systematic Uncertainties 

For the next experiment we ran two tests with flat systematic uncertainties with a discontinuity at the center. Fig. 5 shows the way in which Collie approximates a solution for an asymmetric Gaussian as well as the systematic uncertainty itself. The first test held the positive systematic uncertainty constant and varied the negative, while the second reversed the roles.

One notable difference here from the other tests is that we had to use the observed Collie 
confidence level instead of the calculated median, which results in slightly greater random 
variability. This is due to the systematic uncertainty being non-Gaussian. 

Both sets were run from 0% to 50% on the uncertainty that varies, but are only plotted up to 35%. This is because the data at and above 35% return unusable values due to a failure in the AWW approximation. This occurs because at this level and type of systematic uncertainty the histograms are no longer Gaussian. When there is 5% negative and no positive uncertainty the AWW approximation overestimates the value. Other than this, at low uncertainty differences the AWW approximation is still valid; however, above 35% on the varying systematic uncertainty it is invalid as the model breaks down.

Figure 5: The plot in (a) shows the Collie approximation to an asymmetric Gaussian, (b) shows the flat systematic uncertainty at negative fluctuations of 5% and positive fluctuations of 10%.

Figure 6: Here the negative side of the flat systematic uncertainty was held at 5% while the positive varied from 0% to 35%. (a) shows the signal scale required for 90% confidence, (b) shows the ratio.




Figure 7: This displays the same information as Fig. 6 except for this experiment the positive component is held fixed while the negative is varied.

4.4 Uncertainty on Background Shape

Next, we tested the resilience of the AWW approximation against deviations in the $\tau$ of the exponential decay function of the background. The initial $\tau$ value we used, 0.203, was chosen in order to simulate the log-likelihood ratio values found in a set of real Tevatron data (this is also true for the case of the signal, where $\tau = 0.215$). Fig. 8 displays these findings.

Figure 8: A background-only shape systematic uncertainty in which the $\tau$ value is deviated from 0% to 10%, in both positive and negative directions. (a) Again 95% is used for the confidence level requirement and (b) displays the ratio, (c) shows the fractional uncertainty when the $\tau$ value deviates by 5%.



The AWW approximation stays consistently below the Collie value by around 1.5% and 
follows the same trend. This test was run with a 5% rate systematic uncertainty on the 
background, which holds the ratio maximum at around 0.95. The ratio varies within a 
percent of 95%, therefore the approximation is valid. 



4.5 Varying the Number of Histogram Bins 

An inherent loss of information occurs when data are binned. Due to this, we want to test the ability of the AWW approximation to reproduce the level of information loss of the full calculation by varying the number of bins. Our results are presented in Fig. 9.

This test was run with a 5% background-only rate systematic uncertainty. As is consistent with this additional uncertainty the ratio stays around 95%; the AWW approximation is valid in reproducing equivalent information loss.

Figure 9: The number of bins was varied from 100 to 1500 in increments of 100. (a) displays the signal scale required to achieve 95% confidence and (b) the ratios for both methods.

4.6 Variation in the Number of Events 

The last test varied the number of data points used. We wanted to find how many data points were necessary in order to achieve a usable approximation. These results are plotted in Fig. 10.

Figure 10: (a) The Collie and the approximation values for the signal strength required for 95% confidence are plotted as a function of the number of data points. The x-axis is labeled as events/50,000. 50,000 is the lowest number of data points that returned usable values for both calculations; the number of iterations is equal to the number of data points. (b) shows the ratio of the two. These plots were generated with a 5% rate systematic uncertainty on the background.



When the number of events is too small the conditions for Wilks' theorem are not met, which invalidates the AWW approximation under these conditions. This is evident on the ratio plot, where there is an asymptotic behavior as the number of events increases. This test was run with a 5% background-only rate systematic uncertainty, so the limit approaches about 0.95.
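The dependence on sample size can be illustrated with a toy example that is separate from the Collie study: a single-bin counting experiment with known background $b$, for which the exact Poisson tail probability can be compared with the asymptotic discovery significance $Z = \sqrt{q_0}$, where $q_0 = 2[n \ln(n/b) - (n - b)]$. The single-bin model, that asymptotic formula and the chosen $(b, n)$ pairs are assumptions brought in for the sketch and are not taken from the tests above.

```python
import math
from statistics import NormalDist


def exact_z(n_obs: int, b: float) -> float:
    """Z-score of the exact Poisson tail probability P(N >= n_obs | b)."""
    # P(N < n_obs) is summed term by term, with lgamma(k + 1) = ln k!.
    cdf_below = sum(
        math.exp(k * math.log(b) - b - math.lgamma(k + 1.0)) for k in range(n_obs)
    )
    # Z = Phi^{-1}(1 - p) with 1 - p = P(N < n_obs).
    return NormalDist().inv_cdf(cdf_below)


def asymptotic_z(n_obs: int, b: float) -> float:
    """Z = sqrt(q_0) with q_0 = 2 [n ln(n / b) - (n - b)], for n_obs > b."""
    return math.sqrt(2.0 * (n_obs * math.log(n_obs / b) - (n_obs - b)))


if __name__ == "__main__":
    # Made-up (background, observed) pairs with a roughly similar excess.
    for b, n_obs in ((2.0, 6), (20.0, 30), (200.0, 231)):
        z_exact = exact_z(n_obs, b)
        z_asym = asymptotic_z(n_obs, b)
        print(f"b = {b:6.1f}, n = {n_obs:4d}: exact Z = {z_exact:.3f}, "
              f"asymptotic Z = {z_asym:.3f}, gap = {z_asym - z_exact:.3f}")
```

In this toy the gap between the two significances shrinks as the expected count grows, which is the same qualitative trend as the asymptotic approach seen in the ratio plot.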



5 Conclusion 

In summary, we tested the AWW approximation against the full semi-frequentist calculation, with no approximations, as calculated in Collie. We ran background-only rate systematic uncertainties, background-only and signal shape systematic uncertainties, asymmetric Gaussian flat systematic uncertainties, varied the background shape itself, varied the number of bins, and varied the number of events. The AWW approximation behaved as expected based on the results from [1].

The tests where the model correctly reproduces the parameter values of the full calculation include the rate systematic uncertainties, the background and signal shape uncertainties, the number of histogram bins, and the uncertainty in the background shape. The shape systematic uncertainties on only background, and the combined shape and background systematic uncertainties, run at about 95% of the true value, i.e. the AWW approximation would exclude at 95% of the signal strength required by the full calculation. When there are no systematic uncertainties the two methods returned nearly equal values. None of the figures for these tests show any absolute trend.

The tests where the AWW model breaks down occur where expected. The first of these is the set of asymmetric Gaussian tests. In the case where the asymmetry is small, roughly at or below 25% difference (A=2/3), the AWW approximation and Collie agree. But when the difference is greater the AWW approximation fails. The second test where the model fails to reproduce the full calculation value is where the number of data points is varied. At low numbers the model fails to reproduce the full calculation, but as the number increases it approaches an asymptotic value close to that of the full calculation.

These results are as expected given the two approximations, Wilks and Wald, combined to form the new approximation, the Asimov data set, and are consistent with the report this paper examines. One of the conditions for Wilks' theorem is using a sufficiently large sample and one of the conditions for Wald's theorem is that the data uncertainties follow a Gaussian distribution. (There are more conditions necessary to use either theorem, but these are the two that explain the behavior found in tests where the AWW approximation fails.) In the case where an asymmetric Gaussian becomes non-Gaussian the model fails, as expected according to Wald's theorem, and as the number of data points falls, the mentioned condition for Wilks' theorem fails (while the neglected term in the Wald formula also grows).

Therefore, we conclude that when the conditions of Wilks' and Wald's theorems are met, the approximation presented in "Asymptotic formulae for likelihood-based tests of new physics" does reproduce the full calculation reliably within 5-10%. Our results suggest that the approximation, published by Cowan, Cranmer, Gross, and Vitells, has the correct asymptotic behavior as designed. The approximation does have limitations when any of the component approximations are explicitly invalidated, also as expected.